Updated Pseudo-seq Protocol for Transcriptome-Wide Detection of Pseudouridines

Pseudouridine (Ψ), the most prevalent modified base in cellular RNAs, has been mapped to numerous sites not only in rRNAs, tRNAs, and snRNAs but also in mRNAs. Multiple techniques exist to identify Ψs, but as sequencing technologies have advanced, some of the reagents they rely on are no longer compatible with current sequencers. Here, we present the updated Pseudo-seq, a technique enabling the transcriptome-wide identification of pseudouridylation sites with single-nucleotide precision. We provide a comprehensive description of Pseudo-seq, covering protocols for RNA isolation from human cells, library preparation, and detailed data analysis procedures. The methodology presented is easily adaptable to any cell or tissue type from which high-quality mRNA can be isolated. It can be used for discovering novel pseudouridylation sites, thus constituting a crucial initial step toward understanding the regulation and function of this modification.

Key features
• Identification of Ψ sites on mRNAs.
• Updated Pseudo-seq provides precise positional and quantitative information of Ψ.
• Uses a more efficient library preparation with the latest, currently available materials.

This protocol is used in: Molecular Cell (2023), DOI: 10.1016/j.molcel.2023.01.009

Background
Many genetic diseases are caused by various mutations in specific disease genes. A significant proportion (~15%) of these mutations are nonsense mutations that create a premature termination codon (PTC) [1,2]. Consequently, the nonsense-mediated mRNA decay (NMD) surveillance pathway degrades a large fraction of PTC-containing mRNA [3]. Translation of the remaining undegraded PTC-containing mRNA terminates at the PTC, leading to no production of full-length protein and hence disease. Thus, suppressing NMD and translation termination at PTCs has become an attractive strategy for combating these diseases. To address diseases caused by nonsense mutations in particular genes, substantial efforts have focused on altering PTC-containing mRNA associated with the condition. This alteration, occurring at the RNA and not DNA level, aims to convert the PTC back into a sense codon [4]. Inspired by this concept and considering the distinct chemical properties of pseudouridine (Ψ) compared to uridine, we have introduced a pioneering approach termed RNA-guided RNA pseudouridylation (U-to-Ψ conversion) [5,6]. This strategy targets the uridine within a PTC (UAA, UAG, or UGA), effectively inhibiting NMD while facilitating PTC read-through, leading to the production of a full-length functional protein in the cell. Our observations in yeast cells demonstrate a substantial increase in nonsense read-through upon converting the invariant U of a PTC into a Ψ [5,7]. The targeting of nonsense codons in yeast involves the expression of a designer box H/ACA guide RNA (gRNA), which possesses the capability to site-specifically direct the conversion of U to Ψ within the nonsense codon [8]. Box H/ACA gRNAs, abundant in archaea and eukaryotes, naturally direct pseudouridylation of rRNAs, snRNAs, and mRNAs in eukaryotes at specific sites [9-12]. Existing in the cell as a ribonucleoprotein complex (box H/ACA RNP), each box H/ACA gRNA directs site-specific pseudouridylation via distinctive base-pairing between the gRNA guide sequence and the substrate RNA [13].
Based on these observations, we have recently developed a novel approach, namely targeted PTC pseudouridylation, to suppress nonsense mutations in human cells [14]. By co-transfecting human cells with a designer box H/ACA gRNA gene targeting the PTC, we showed that targeted pseudouridylation suppressed both NMD and translation termination at PTCs. Targeted pseudouridylation appears to be the first RNA-directed gene-specific therapeutic approach that suppresses NMD and concurrently promotes PTC read-through. To rule out off-target effects of the gRNA transfection, we designed and performed Pseudo-seq to detect transcriptome-wide pseudouridylation. Recently, a number of Ψs have been predicted and experimentally detected by next-generation sequencing techniques [11,12,15,16]. In these techniques, RNA is first treated with the carbodiimide N-cyclohexyl-N′-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMC), which forms covalent adducts with guanosine, uridine, and Ψ [17]. Subsequently, alkaline hydrolysis removes the adducts from guanosine and uridine, but the CMC adduct at the N3 position of Ψ is resistant. The remaining CMC adduct on Ψ is an effective barrier to reverse transcriptase, which terminates one nucleotide before the modified Ψ. By mapping these strong reverse-transcription stop sites globally, the positions of Ψs can be determined. These methods are powerful and precise, but the original protocol has become obsolete for two reasons. First, the adapters used in the initial Pseudo-seq are no longer compatible with current sequencers. Consequently, we substituted these adapters with a new set commonly employed in eCLIP [18]. Second, also based on eCLIP technical advances, adapter ligation is more efficient than circularization [18]. Hence, we replaced the DNA circularization step with adapter ligation. With this revised Pseudo-seq technique, transcriptome-wide Ψs can be detected more efficiently in less-abundant mRNA.

Procedure
We describe below the step-by-step procedure for performing the updated Pseudo-seq using HEK293T cells, with modifications and improvements implemented at various steps. Although this protocol follows the workflow of the original Pseudo-seq method [19], the data it generates differ from those of the original method, so the data analysis method is described in detail. Store all materials at 4 °C except for SDS-containing buffers and perform the procedure on ice. Room temperature is defined as 22 °C throughout this protocol. All washing steps throughout the protocol are performed with a volume of 1 mL unless stated otherwise.

A. Cell culture, transfection, and RNA collection
1. HEK293 cells were cultivated in DMEM medium with 10% FBS. Cells grown for under 20 passages are desired, but older cells can be used as long as they are of normal morphology.

2. Cell passage:
a. Wash the cells with 37 °C PBS once.
b. Add trypsin to the dishes and incubate at 37 °C for 3 min.
c. Add the FBS-containing medium to the dishes to inactivate trypsin.
d. Collect the cells by centrifuging at 300× g for 5 min and then count and split the cells for different purposes.
e. Seed the cells at 20% confluency.
3. Maintain cells at 37 °C with 5% CO2 and passage every 2-3 days at 80%-100% confluence.
4. The day before transfection, seed a certain number of cells in 6-well plates and incubate for 24 h to 80%-90% confluency.

Resuspend beads in 1 mL of 75% ethanol and allow beads to sit for 30 s. Place tube on the magnet and allow beads to separate; remove supernatant carefully. Wash a second time with 75% ethanol, allowing beads to sit for 30 s. Place tube on the magnet and allow beads to separate; remove supernatant carefully, getting all residual liquid possible. Allow beads to air-dry for 5 min on magnet.
d. Remove tube from magnet and resuspend the beads in 12 μL of ddH2O. Incubate for 5 min at room temperature.
e. Place tube on the magnet and allow beads to separate; transfer 10 μL of the supernatant to a new tube.
12. Quantitate library on Bioanalyzer and submit for sequencing.
13. Sequencing should be performed on an Illumina NextSeq system (a NextSeq 550 was used in this protocol development; single-end 150-nucleotide read length is sufficient).

Data analysis
Filter and map reads, catalog 3′ ends: Once the raw sequencing data have been acquired, the adapter sequences must first be removed. Several free, publicly available software tools can be used (e.g., Cutadapt [20]). The 3′ adapter contains a unique molecular identifier (UMI) that allows duplicate reads arising from PCR overamplification to be removed. We used UMI-tools [21] for this purpose. Next, the reads are mapped back to the genome using any of the freely available alignment tools; we used STAR aligner [22]. The final preprocessing step is to extract the coordinates of the 3′ end of each read and determine the density of 3′ ends at each nucleotide position within the genome. This can be accomplished using the genomeCoverageBed function of Bedtools [23]. A separate bedgraph should be produced for each strand in each sample.
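The 3′ end cataloging step can be illustrated with a minimal Python sketch. It is not the pipeline used in this protocol (which relies on Bedtools); it only shows the bookkeeping involved. It assumes reads are already aligned and deduplicated and are given as BED-style tuples (0-based start, exclusive end). The function names are hypothetical, and which read end marks the RT stop depends on the library orientation, so the strand logic here is an assumption that follows the wording of the text.

```python
from collections import Counter

def three_prime_end(start, end, strand):
    """Return the 0-based coordinate of a read's 3' end.

    Assumes BED-style intervals: 0-based start, exclusive end.
    """
    if strand == "+":
        return end - 1
    if strand == "-":
        return start
    raise ValueError(f"unknown strand: {strand!r}")

def count_three_prime_ends(reads):
    """Tally 3' ends per (chrom, position), separately for each strand.

    `reads` is an iterable of (chrom, start, end, strand) tuples.
    Returns one Counter per strand, mirroring one bedgraph per strand.
    """
    counts = {"+": Counter(), "-": Counter()}
    for chrom, start, end, strand in reads:
        pos = three_prime_end(start, end, strand)
        counts[strand][(chrom, pos)] += 1
    return counts

def to_bedgraph_lines(counter):
    """Format one strand's counts as sorted bedGraph lines."""
    return [f"{chrom}\t{pos}\t{pos + 1}\t{n}"
            for (chrom, pos), n in sorted(counter.items())]

# Toy input: two plus-strand reads ending at the same base, one minus-strand read.
reads = [
    ("chr1", 100, 150, "+"),
    ("chr1", 120, 150, "+"),
    ("chr1", 100, 150, "-"),
]
counts = count_three_prime_ends(reads)
plus_lines = to_bedgraph_lines(counts["+"])
minus_lines = to_bedgraph_lines(counts["-"])
```

Keeping the two strands in separate counters corresponds to producing one bedgraph per strand per sample.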
Identify putative strong-stop peaks, assign to exons, measure background in surrounding region: In order to eliminate 3′ ends generated by random reverse-transcriptase termination, we filter the genome-wide bedgraphs to require a minimum of 10 reads to identify putative RT-stop sites (peaks). After filtering, the peaks are then assigned to exons of genes using the intersectBed function of Bedtools [23]. A reference transcriptome of choice can be used for gene/exon assignment, for example GENCODE [24]. In the next step, the background signal for reverse transcriptase termination sites must be determined, in order to identify peaks that are of statistically significant enrichment. With the peak coordinate in the center position, a 100-nucleotide window is generated surrounding the peak within its assigned exon. If the window reaches the ends of the exon on either side, it is truncated so as not to include intronic sequence, which is depleted of reads, in the window. Thus, some windows may be shorter than 100 nucleotides. The 3′ end read depth at every nucleotide of each window was determined using the coverage function of Bedtools [23] with the -d option and using the previously created bedgraphs as input.

Published: May 05, 2024
The read depth at the peak position and the distribution of per-nucleotide 3′ end read depth within the window for that peak were used in hypothesis testing as described below.
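The peak filtering and window construction described above can be sketched in Python as follows. This is an illustrative sketch, not the Bedtools-based implementation used in the protocol. The 10-read minimum and the 100-nucleotide window come from the text; the function names, the 0-based half-open coordinates, and the choice of 50 nucleotides upstream and 49 downstream of the peak are assumptions.

```python
MIN_READS = 10      # minimum 3' end count for a putative RT-stop peak (from the text)
WINDOW_SIZE = 100   # nominal window length in nucleotides (from the text)

def filter_peaks(depth_by_pos, min_reads=MIN_READS):
    """Return sorted positions whose 3' end depth meets the read threshold."""
    return sorted(pos for pos, depth in depth_by_pos.items() if depth >= min_reads)

def exon_clipped_window(peak, exon_start, exon_end, size=WINDOW_SIZE):
    """Return a half-open window around `peak`, clipped to its exon.

    Coordinates are 0-based; the exon spans [exon_start, exon_end).
    The window may be shorter than `size` near exon boundaries.
    """
    if not exon_start <= peak < exon_end:
        raise ValueError("peak must lie inside its assigned exon")
    half = size // 2
    start = max(exon_start, peak - half)
    end = min(exon_end, peak + half)
    return start, end

def window_depths(depth_by_pos, start, end):
    """Per-nucleotide 3' end depth in the window; uncovered positions count as 0."""
    return [depth_by_pos.get(pos, 0) for pos in range(start, end)]

# Toy example: one strong peak in the middle of an exon, one near its 5' edge.
depths = {1010: 12, 1100: 40, 1101: 3, 1120: 1}
peaks = filter_peaks(depths)
central = exon_clipped_window(1100, exon_start=1000, exon_end=1200)
edge = exon_clipped_window(1010, exon_start=1000, exon_end=1200)
background = window_depths(depths, *central)
```

The clipped window near the exon edge is shorter than 100 nucleotides, which matches the behavior described in the text.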

Identification of high-confidence Ψs: To test each putative reverse-transcription strong-stop site for significance above the background of random terminations, the read depths at each nucleotide position within the window surrounding each peak position are fitted with a Poisson distribution. The Poisson distribution parameter μ is derived using maximum-likelihood estimation. Then, the read count at the peak position is tested against the null hypothesis that it was sampled from the same Poisson distribution found in the surrounding window, in order to derive a p-value. A false-discovery rate (FDR) is then estimated using the Benjamini-Hochberg procedure; we set the threshold for a positive at 0.05. Finally, peaks that are significant in the CMC+ sample but not in the CMC− sample and that have a genomic "T" base at the position directly 3′ of the putative RT-stop allow us to assign the adjacent "T" as a high-confidence pseudouridylation site.
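The statistical test above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the analysis code used for the published data. For a Poisson model, the maximum-likelihood estimate of μ is the mean of the window depths; the p-value is the upper tail P(X ≥ k) for the peak count k; and the Benjamini-Hochberg adjustment is applied across all tested peaks. The function names are hypothetical, and whether the peak position itself is included in the fitted window is left to the caller.

```python
import math

def poisson_mle(depths):
    """Maximum-likelihood estimate of the Poisson mean: the sample mean."""
    return sum(depths) / len(depths)

def _log_pmf(k, mu):
    return -mu + k * math.log(mu) - math.lgamma(k + 1)

def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), summed directly to keep small p-values."""
    if k <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    if k > mu:
        # Terms decrease for i > mu, so sum upward until they are negligible.
        term = math.exp(_log_pmf(k, mu))
        total, i = 0.0, k
        while term > total * 1e-17:
            total += term
            i += 1
            term *= mu / i
        return min(1.0, total)
    # k <= mu: the tail is large, so subtract the lower tail P(X <= k - 1).
    term = math.exp(_log_pmf(k - 1, mu))
    lower, i = 0.0, k - 1
    while i >= 0 and term > 0.0:
        lower += term
        term *= i / mu
        i -= 1
    return max(0.0, 1.0 - lower)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def is_high_confidence(q_cmc_plus, q_cmc_minus, base_3prime_of_stop, fdr=0.05):
    """Apply the calling rule from the text: CMC+ significant, CMC- not, genomic T."""
    return (q_cmc_plus < fdr and q_cmc_minus >= fdr
            and base_3prime_of_stop.upper() == "T")

# Toy example: a sparse background window and a peak of 25 reads.
window = [0, 1, 0, 2, 1, 0, 0, 1, 1, 0]
mu = poisson_mle(window)
p_peak = poisson_upper_tail(25, mu)
q_values = benjamini_hochberg([p_peak, 0.2, 0.6])
call = is_high_confidence(q_values[0], q_cmc_minus=0.8, base_3prime_of_stop="T")
```

Summing the upper tail directly, rather than computing 1 minus the cumulative distribution, avoids rounding very small p-values to zero before the FDR adjustment.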

Validation of protocol
This revised protocol was used in a recent publication to detect transcriptome-wide Ψs and confirm that the exogenous gRNA is specific and does not show off-target effects [14]. The revised Pseudo-seq identified a total of 1,370 Ψs in polyadenylated transcripts that were significant in two independent replicates, and another 3,979 that met the statistical threshold in a single replicate. Note that the number of sites identifiable in any specific experiment may depend on many factors, such as the species and cell type in which the experiment is conducted and the sequencing read depth. Figure 4 shows that there were no significant differences in the magnitude of the Ψ stop signals between the two samples of mRNA isolated from cells before and after (or with and without) transfection of a β-thalassemia PTC-specific gRNA, suggesting that our approach has no significant off-target problem. Table 1 shows that even mRNAs with a sequence similar to the gRNA target did not differ significantly between the "minus gRNA" and "plus gRNA" samples.

General notes
1. The 3′ DNA linker (rand103Tr3) carries a 10-nt unique molecular identifier (UMI), denoted as NNNNNNNNNN.
2. For ethanol precipitations, make sure you do not use ammonium acetate, as the ammonium ions can carry over and are potent inhibitors of T4 PNK. Use sodium acetate that has been pH-adjusted to 5.5-6. Unadjusted 3 M NaOAc is basic and will result in alkaline hydrolysis of your RNA!
3. Allow the fragmented RNA solution to cool before adding enzyme.
4. N-cyclohexyl-N′-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMC) should be prepared freshly at 0.5 M (212 mg/mL) in BEU buffer. CMC is sometimes described as 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate.
5. Do not turn up the voltage too quickly while prewarming the denaturing polyacrylamide gel, as the plates may crack.
6. To increase the density of the sample so that it loads into the well more easily, add 1.2-1.5 volumes of 2× loading dye.
7. According to the original Pseudo-seq protocol [19]: "It is very important that the RT primer be gel-purified to ensure that it is a uniform length, allowing robust separation of truncated from full-length cDNAs. Gel-purification should be performed in house, as gel-purified primers obtained commercially can be heterogeneous."
8. It may be necessary to empirically determine whether the Ct values obtained from a specific real-time qPCR protocol yield values that give the correct amplification range. Use the Bioanalyzer quantitation to zero in on the correct cycle number for subsequent experiments.

5. Transfect the plasmids into HEK293T cells using PEI MAX 40000:
a. Mix 150 μL of Opti-MEM and 10 μL of PEI and incubate for 5 min.
b. Add 2 μg of gRNA plasmid and incubate for another 15 min.
c. Add the transfection mixture directly to the cells.
6. Collect the cells at 24-48 h after transfection.
7. Collect total RNA from one well of the 6-well plate with TRIzol reagent.
8. Add 1 mL of TRIzol and collect the cell lysate with a cell scraper.
9. Let the sample stand for 5 min at room temperature.
10. Add 200 μL of chloroform.

Resuspend beads in 1 mL of 75% ethanol and transfer to a new tube. Place the tube on the magnet and allow the beads to separate; remove the supernatant carefully. Wash twice more with 75% ethanol, allowing beads to sit for 30 s each time. Place the tube on the magnet and allow the beads to separate; remove the supernatant carefully, getting all residual liquid possible. Allow beads to air-dry for 5 min.
d. Elute RNA: Resuspend beads in 27 μL of 10 mM Tris-HCl pH 7.5 and incubate for 5 min at room temperature. Place the tube on the magnet and allow beads to separate; then, transfer 25 μL of sample to a new tube.
Note: At this point, the protocol can be paused overnight, and the sample can be stored at -20 °C.
9. qPCR to quantify cDNA (in order to determine how many PCR cycles to use).
a. Prepare 9 μL of qPCR master mix, mixing per sample:
5 μL of SYBR Select Master Mix
3.6 μL of ddH2O
0.4 μL of qPCR primer mix (10 μM each D5 and D7 primers mixed together)
Add 1 μL of 1:10 diluted (in ddH2O) cDNA to each well of a 384-well qPCR plate. Mix master mix, add 9 μL to each well, and pipette to mix on ice.
b. Run qPCR according to standard procedure. As a starting point for the final PCR, use 3 cycles less than the Ct of the 1:10 diluted sample (Note 8) (sample results are shown in Figure 3).
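The qPCR arithmetic in step 9 can be sketched as a small Python helper. The per-sample volumes and the "Ct minus 3 cycles" starting point are taken from the steps above; the 10% pipetting overage and rounding the Ct down to a whole cycle are assumptions added for illustration, and the function names are hypothetical.

```python
import math

# Per-reaction volumes in microliters, as listed in step 9.
PER_SAMPLE_UL = {
    "SYBR Select Master Mix": 5.0,
    "ddH2O": 3.6,
    "qPCR primer mix (10 uM each D5 and D7)": 0.4,
}

def master_mix(n_reactions, overage=0.1):
    """Scale the master mix for n reactions plus a pipetting overage (assumed 10%)."""
    scale = n_reactions * (1 + overage)
    return {name: round(vol * scale, 2) for name, vol in PER_SAMPLE_UL.items()}

def starting_pcr_cycles(ct_of_1_in_10_dilution, offset=3):
    """Starting cycle number for the final PCR: Ct of the 1:10 dilution minus 3.

    Rounding the Ct down to a whole cycle is an assumption of this sketch.
    """
    return max(0, math.floor(ct_of_1_in_10_dilution) - offset)

mix = master_mix(10)
cycles = starting_pcr_cycles(15.7)
```

As Note 8 points out, the cycle number obtained this way is only a starting point and may need to be adjusted using the Bioanalyzer quantitation.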

Figure 3. Sample qPCR results for cDNA quantification. A. Amplification plot. The threshold for Ct calculation is defined automatically by the software and is shown as a bold line. B. Ct plot. Three technical replicates for each sample.

Figure 4. Reverse transcription (RT)-stop peak height over background is highly similar in control and gRNA-transfected cells [14]. Transcriptome-wide Ψ mapping was carried out. HEK293T cells were transfected with the plasmid containing the β-thalassemia PTC-specific gRNA gene or left untransfected (control). mRNA was recovered and the Pseudo-seq libraries were constructed (see Materials and Methods). Ψs were identified and

Table 1. Targeted pseudouridylation has no significant off-target effects [14]. From the set of peak positions genome-wide, those with a minimal match to the gRNA target sequence (ACCΨNGA) were extracted. This yielded 22 peaks. The gene and gene position (CDS, 5′UTR, or 3′UTR) are shown in columns 1 and 2. The extended surrounding nucleotides were then identified and, as shown in column 3, matches to the gRNA target sequence are shown in bold capital letters, mismatches in lowercase, and the putative Ψ genomic DNA is shown as a bold red T. In column 4, the number of contiguous matching nucleotides (without counting unpaired ΨN) is shown. Columns 5, 6, 7, and 8 provide the p-values of the enrichment of peak read counts over background for the control and gRNA-treated samples, respectively. Statistical significance was considered for peaks for which the CMC+, but not the CMC−, sample reached the significance cutoff. Peaks that are significant in the gRNA-treated CMC+ sample, but not in the CMC− or CMC+ control samples, are shaded pink.

*Note: Filter sterilize (do not autoclave).
*Note: Make stock solutions of EDTA, SDS, and the dyes. Mix into 95% formamide at the time of use. Discard any leftover.
*Note: TBE is used at a final 0.5× concentration. Dilute 8 times before use. Usually, pH adjustment is not necessary.